[fix](mv) Invalidate rewrite cache on constraint changes#62530
[fix](mv) Invalidate rewrite cache on constraint changes#62530foxtail463 wants to merge 2 commits intoapache:masterfrom
Conversation
|
/review |
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Blocking finding:
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/DropConstraintCommand.javaandfe/fe-core/src/main/java/org/apache/doris/persist/EditLog.java
TheDROP CONSTRAINTflow removes the constraint fromConstraintManagerbefore invalidating dependent MTMV rewrite caches.MTMV.getOrGenerateCache()returns any existing cache immediately, so a concurrent planner in that window can still reuse a cache built with the old PK/FK/UK metadata and accept a rewrite that should already be disallowed. This affects both the live DDL path and journal replay.
Critical checkpoint conclusions:
- Goal of the task: Partially accomplished. The PR correctly identifies that constraint changes must invalidate dependent MTMV rewrite caches, and it adds coverage for cache rebuild/in-flight publication, but
DROP CONSTRAINTstill leaves a stale-cache visibility window. - Small, clear, focused change: Yes. The patch is scoped to cache invalidation around constraint changes plus tests.
- Concurrency: The new generation counter in
MTMVis a good fix for in-flight cache publication, but cache invalidation is not atomic with the metadata change on the drop path, so concurrent query planning can still observe stale rewrite metadata. I did not find a new lock-order or deadlock issue in the added code. - Lifecycle / static initialization: No special lifecycle or static-init issue found in the touched code.
- Configuration changes: None.
- Incompatible changes / compatibility: No FE-BE protocol or storage-format compatibility issue introduced. Edit-log write/replay coverage exists, but replay inherits the same stale-cache ordering bug.
- Parallel code paths: The live command path and replay path were both updated, and both contain the same ordering problem.
- Special conditional checks: The new null/unknown-constraint checks are reasonable.
- Test coverage: Improved with FE unit tests and a regression test, but there is still no test covering the concurrent
DROP CONSTRAINTstale-cache window. - Test result modifications: Added regression case looks aligned with MTMV behavior expectations; I did not execute the suite in this runner.
- Observability: No additional observability appears necessary for this change.
- Transaction / persistence: Metadata write and replay paths are both handled, but replay currently does not preserve the intended invalidation semantics under stale cached MTMVs.
- Data writes / atomicity: This is metadata-only, but the cache invalidation and constraint-state transition are not atomically visible to readers on
DROP CONSTRAINT. - FE/BE variable passing: Not applicable.
- Performance: No obvious new hot-path performance issue.
- Other issues: None beyond the blocking correctness issue above.
|
run buildall |
FE Regression Coverage ReportIncrement line coverage |
|
run feut |
|
run feut |
|
run external |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
/review |
|
OpenCode automated review failed and did not complete. Error: Review step was failure (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
What problem does this PR solve?
Problem Summary:
Constraint changes can change MV rewrite eligibility, especially for PK/FK/UK-based reasoning.
Example:
Without the relevant PK/FK metadata, the rewrite may fail.
After:
the same query may become rewritable because the optimizer can prove the join relationship through constraints.
After dropping either the FK or the referenced PK, that reasoning is no longer valid, so the rewrite should stop again.
This patch invalidates dependent MTMV rewrite caches after ADD/DROP CONSTRAINT. The invalidation is driven by the analyzed table name and affected base-table infos, so it does not depend on rewrite cache contents that may have been built from old constraint metadata.
It also prevents an in-flight rewrite cache built before the constraint change from being published after invalidation, avoiding stale PK/FK/UK metadata from being reused by later rewrites.